A Comparative Study on Different Types of Approaches to Bengali Document Categorization

Authors

  • Md. Saiful Islam
  • Fazla Elahi Md Jubayer
  • Syed Ikhtiar Ahmed
Abstract

Document categorization is the task of determining the category to which a document belongs. This paper compares three well-known supervised learning techniques, Support Vector Machine (SVM), Naïve Bayes (NB), and Stochastic Gradient Descent (SGD), for Bengali document categorization. Besides the classifier, classification performance also depends on how features are selected from the dataset. To analyze how well these classifiers assign a document to one of twelve categories, feature selection techniques are also applied, namely the chi-square distribution and normalized TF-IDF (term frequency-inverse document frequency) with a word analyzer. We thus explore the efficiency of these three classification algorithms using two different feature selection techniques.
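The two feature selection techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF variant, the 2x2 contingency-table form of the chi-square statistic, and the toy English corpus are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

# Toy corpus and labels (hypothetical; the paper uses Bengali documents
# in twelve categories).
docs = [
    "cricket match score",
    "cricket team win",
    "election vote result",
    "vote count election",
]
labels = ["sports", "sports", "politics", "politics"]


def tokenize(text):
    # Word-level analyzer: split on whitespace (an assumption).
    return text.split()


def tfidf_vectors(documents):
    """Return one L2-normalized TF-IDF dict (term -> weight) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, one common variant: ln((1 + N) / (1 + df)) + 1.
        weights = {
            t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def chi_square(documents, doc_labels, term, category):
    """Chi-square statistic of the 2x2 table: term present? x in category?"""
    a = b = c = d = 0
    for text, label in zip(documents, doc_labels):
        present = term in tokenize(text)
        in_category = label == category
        if present and in_category:
            a += 1  # term present, document in category
        elif present:
            b += 1  # term present, document in another category
        elif in_category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


vectors = tfidf_vectors(docs)
print(chi_square(docs, labels, "cricket", "sports"))
print(chi_square(docs, labels, "match", "sports"))
```

Terms with a higher chi-square score for a category are more strongly associated with it, so a selector would keep the top-scoring terms and pass their normalized TF-IDF weights to a classifier such as SVM, NB, or SGD.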


Similar resources

Qualitative and Quantitative Examination of Text Type Readabilities: A Comparative Analysis

This study compared 2 main approaches to readability assessment. The quantitative approach applied idea density based on part of speech tagging and compared 3 sets of text types (i.e., narrative, expository, and argumentative) with respect to their ease of reading. The qualitative approach was done through developing questionnaires measuring intermediate EFL learners’ perceptions on content, motivat...


A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...


The Comparative Impact of Content-Based and Task-Based Teaching in a Critical Thinking Setting on EFL Learners’ Reading Comprehension

This study was an attempt to investigate the comparative impact of two types of teaching approaches, namely content-based (CBI) and task-based (TBLT) instruction on the reading comprehension of Iranian EFL learners. For this purpose, sixty intermediate students from a pool of eighty five students studying at a private language school were chosen using a piloted PET. The students were then rando...


Tashir in the Illuminations of Khamsah Nizami Tahmaspi(OR 2265): A Comparative Typology

Known to be one of the most significant illustration techniques in bibliopegy, tashir is characterized by a variety of aesthetical features over different eras. The Safavid Reign marks the summit of this art form. A magnificent inheritance of this era, the Tahmaspi Khamsah is an exquisite manuscript illuminated with the two prominent types of tashir common during the Safavid period. The tashirs...


The Use of WordNets for Multilingual Text Categorization: A Comparative Study

The successful use of the Princeton WordNet for Text Categorization has prompted the creation of similar WordNets in other languages as well. This paper focuses on a comparative study between two WordNet based approaches for Multilingual Text Categorization. The first relies on using machine translation to access directly the Princeton WordNet while the second avoids machine translation by usi...



Journal:
  • CoRR

Volume: abs/1701.08694

Publication date: 2016